Abstract — Traditional biometric recognition systems rely on a
single biometric signature for authentication. While the advantage
of using multiple sources of information for establishing
identity has been widely recognized, computational models for
multimodal biometrics recognition have only recently received
attention. We propose a novel multimodal multivariate sparse
representation method for multimodal biometrics recognition,
which represents the test data by a sparse linear combination of
training data, while constraining the observations from different
modalities of the test subject to share their sparse representations.
Thus, we simultaneously take into account correlations as well as
coupling information between biometric modalities. We modify
our model so that it is robust to noise and occlusion. A multimodal
quality measure is also proposed to weigh each modality as it gets
fused. Furthermore, we kernelize the algorithm to handle
non-linearity in data. The optimization problem is solved using an
efficient alternating direction method. Various experiments show
that our method compares favorably with competing fusion-based
methods.

Index  Terms — Multimodal  biometrics,  feature  fusion,  sparse 
representation. 

I.  Introduction 

Unimodal  biometric  systems  rely  on  a  single  source  of 
information  such  as  a  single  iris  or  fingerprint  or  face  for 
authentication [1]. Unfortunately, these systems have to deal
with some of the following inevitable problems [2]: (a) Noisy
data: poor lighting of a user's face or an iris image with contact
lenses are examples of noisy data, (b) Non-universality: the
biometric  system  based  on  a  single  source  of  evidence  may 
not  be  able  to  capture  meaningful  data  from  some  users.  For 
instance,  an  iris  biometric  system  may  extract  incorrect  texture 
patterns  from  the  iris  of  certain  users  due  to  the  presence 
of  contact  lenses,  (c)  Intra-class  variations:  in  the  case  of 
fingerprint  recognition  wrinkles  due  to  wet  fingers  [3]  can 
cause  these  variations.  These  types  of  variations  often  occur 
when  a  user  incorrectly  interacts  with  the  sensor,  (d)  Spoof 
attack:  hand  signature  forgery  is  an  example  of  this  type  of 
attack.  It  has  been  observed  that  some  of  the  limitations  of 
unimodal  biometric  systems  can  be  addressed  by  deploying 
multimodal  biometric  systems  that  essentially  integrate  the 
evidence  presented  by  multiple  sources  of  information  such 
as  iris,  fingerprints  and  face.  Such  systems  are  less  vulnerable 
to spoof attacks as it would be difficult for an impostor to
simultaneously  spoof  multiple  biometric  traits  of  a  genuine 
user.  Due  to  sufficient  population  coverage,  these  systems  are 
able  to  address  the  problem  of  non-universality. 

Classification in multibiometric systems is done by fusing
information from different biometric modalities. The
information fusion can be done at different levels, which
can be broadly divided into feature level, score level and
rank/decision level fusion. Due to preservation of raw information,
feature level fusion can be more discriminative than
score or decision level fusion [4]. However, there has been very
little effort in exploring feature level fusion in the biometric
community. This is because of the different output formats
of different sensors, which result in features with different
dimensions. Often the features have large dimensions, and
fusion becomes difficult at feature level. The prevalent method
is  feature  concatenation,  which  has  been  used  for  different 
multibiometric  settings  [5]— [7] .  However,  in  many  scenarios, 
each  modality  produces  high-dimensional  features.  In  such 
cases,  the  method  is  both  impractical  and  non-robust.  It 
also  cannot  exploit  the  constraint  that  features  of  different 
modalities  should  share  the  same  identity. 

In  recent  years,  theories  of  Sparse  Representation  (SR)  and 
Compressed  Sensing  (CS)  have  emerged  as  powerful  tools  for 
efficient  processing  of  data  in  non-traditional  ways  [8].  This 
has  led  to  a  resurgence  in  interest  in  the  principles  of  SR  and 
CS for biometrics recognition [9]. Wright et al. [10] proposed
a robust sparse representation-based classification (SRC) algorithm
for face recognition. It was shown that by exploiting the
inherent sparsity of data, one can obtain improved recognition
performance over traditional methods, especially when the data
is contaminated by various artifacts such as illumination variations,
disguise, occlusion and random pixel corruption. Pillai
et  al.  extended  this  work  for  robust  cancelable  iris  recognition 
in  [11].  Nagesh  and  Li  [12]  presented  an  expression-invariant 
face  recognition  method  using  distributed  compressed  sensing 
and joint sparsity models. Patel et al. [13] proposed a dictionary-based
method for face recognition under varying pose and
illumination.  A  discriminative  dictionary  learning  method  for 
face  recognition  was  also  proposed  by  Zhang  and  Li  [14].  For 
a  survey  of  applications  of  SR  and  CS  algorithms  to  biometric 
recognition,  see  [8],  [9],  [15],  [16]  and  the  references  therein. 

Fig. 1: Overview of our algorithm.

Motivated by the success of SR in unimodal biometric
recognition, we propose a joint sparsity-based algorithm for
multimodal biometrics recognition.¹ Figure 1 presents an
overview  of  our  framework.  It  is  based  on  the  well  known 
regularized  regression  method,  multi-task  multivariate  Lasso 
[18],  [19].  Our  method  imposes  common  sparsities  both 
within  each  biometric  modality  and  across  different  modalities. 
Furthermore,  we  extend  our  model  so  that  it  can  deal  with 
both  occlusion  and  noise.  Note  that  our  method  is  different 
from  some  of  the  previously  proposed  classification  algorithms 
based  on  joint  sparse  representation.  Yuan  and  Yan  [20] 
proposed  a  multi  task  sparse  linear  regression  model  for  image 
classification.  This  method  uses  group  sparsity  to  combine 
different  features  of  an  object  for  classification.  Zhang  et  al. 
[21]  proposed  a  joint  dynamic  sparse  representation  model  for 
object  recognition.  Their  essential  goal  was  to  recognize  the 
same  object  viewed  from  multiple  observations  i.e.,  different 
poses. Our method is more general in that it not only
considers multivariate sparse representations but can also
deal with multitask multivariate sparse representations, which
are natural in multimodal biometrics. One of the key features
of our model is that it can deal with both occlusion and
noise. Furthermore, using kernel methods, we present nonlinear
extensions of our joint sparse representation method.

This  paper  makes  the  following  contributions: 

• We present a robust feature level fusion algorithm for
multibiometric recognition. Through the proposed joint
sparse framework, we can easily handle different dimensions
of different modalities by forcing different features
to interact through their sparse coefficients. Furthermore,
the proposed algorithm can efficiently handle large
dimensional feature vectors.

• We make the classification robust to occlusion and noise
by introducing an error term into the optimization framework.

• The algorithm is easily generalizable to handle multiple
test inputs from a modality.

• We introduce a quality measure for multimodal fusion
based on joint sparse representation.

• Lastly, we kernelize the algorithm to handle non-linearity
in data samples.

¹ A preliminary version of this work appeared in [17].

A.  Paper  Organization 

The paper is organized as follows. In section II, we
describe the proposed sparsity-based multimodal recognition
algorithm, which is extended to the kernel case in section IV. The
quality measure is described in section III. Experimental evaluations
on a comprehensive multimodal dataset and a face database
are described in section V. Finally, in section VI, we
describe the computational complexity. Concluding remarks
are presented in section VII.

II. Joint sparsity-based multimodal biometrics recognition

Consider a multimodal $C$-class classification problem with
$D$ different biometric traits. Suppose there are $p_i$ training
samples in each biometric trait. For each biometric trait
$i = 1, \dots, D$, we denote

$$X^i = [X^i_1, X^i_2, \dots, X^i_C]$$

as an $n_i \times p_i$ dictionary of training samples consisting of $C$
sub-dictionaries $X^i_k$ corresponding to $C$ different classes.
Each sub-dictionary

$$X^i_j = [x^i_{j,1}, x^i_{j,2}, \dots, x^i_{j,p_j}] \in \mathbb{R}^{n_i \times p_j}$$

represents a set of training data from the $i$th modality labeled
with the $j$th class. Note that $n_i$ is the feature dimension of each
sample and there are $p_j$ training samples in class $j$.
Hence, there are a total of $p = \sum_{j=1}^{C} p_j$ samples in the
dictionary $X^i$. Elements of the dictionary are often referred
to as atoms. In the multimodal biometrics recognition problem,
given a test sample (matrix) $Y$, which consists of $D$ different
modalities $\{Y^1, Y^2, \dots, Y^D\}$ where each sample $Y^i$ consists
of $d_i$ observations $Y^i = [y^i_1, y^i_2, \dots, y^i_{d_i}] \in \mathbb{R}^{n_i \times d_i}$, the
objective is to identify the class to which the test sample $Y$ belongs.
In what follows, we present a multimodal multivariate
sparse representation-based algorithm for this problem [18],
[19], [22].

A.  Multimodal  multivariate  sparse  representation 

We want to exploit the joint sparsity of coefficients from
different biometrics modalities to make a joint decision. To
simplify this model, let us consider a bimodal classification
problem where the test sample $Y = [Y^1, Y^2]$ consists of two
different modalities such as iris and face. Suppose that $Y^1$
belongs to the $j$th class. Then, it can be reconstructed by a
linear combination of the atoms in the sub-dictionary $X^1_j$. That
is, $Y^1 = X^1 \Gamma^1 + N^1$, where $\Gamma^1$ is a sparse matrix with only
$p_j$ nonzero rows associated with the $j$th class and $N^1$ is the
noise matrix. Similarly, since $Y^2$ represents the same subject,
it belongs to the same class and can be represented by training
samples in $X^2$ with a different set of coefficients $\Gamma^2$. Thus, we
can write $Y^2 = X^2 \Gamma^2 + N^2$, where $\Gamma^2$ is a sparse matrix that
has the same sparsity pattern as $\Gamma^1$. If we let $\Gamma = [\Gamma^1, \Gamma^2]$,
then $\Gamma$ is a sparse matrix with only $p_j$ nonzero rows.

In the more general case where we have $D$ modalities,
if we denote $\{Y^i\}_{i=1}^{D}$ as a set of $D$ observations each
consisting of $d_i$ samples from each modality and let $\Gamma =
[\Gamma^1, \Gamma^2, \dots, \Gamma^D] \in \mathbb{R}^{p \times d}$ be the matrix formed by
concatenating the coefficient matrices with $d = \sum_{i=1}^{D} d_i$, then we
can seek the row-sparse matrix $\Gamma$ by solving the following
$\ell_1/\ell_q$-regularized least squares problem

$$\hat{\Gamma} = \arg\min_{\Gamma} \frac{1}{2} \sum_{i=1}^{D} \|Y^i - X^i \Gamma^i\|_F^2 + \lambda \|\Gamma\|_{1,q} \tag{1}$$

where $\lambda$ is a positive parameter and $q$ is set greater than 1 to
make the optimization problem convex. Here, $\|\Gamma\|_{1,q}$ is a norm
defined as $\|\Gamma\|_{1,q} = \sum_{k=1}^{p} \|\gamma^k\|_q$, where the $\gamma^k$ are the row
vectors of $\Gamma$, and $\|Y\|_F$ is the Frobenius norm of the matrix
$Y$ defined as $\|Y\|_F = \sqrt{\sum_{i,j} Y_{i,j}^2}$. Once $\hat{\Gamma}$ is obtained, the
class label associated with an observed vector is then declared
as the one that produces the smallest approximation error,

$$\hat{j} = \arg\min_{j} \sum_{i=1}^{D} \|Y^i - X^i \delta^i_j(\hat{\Gamma}^i)\|_F^2, \tag{2}$$

where $\delta^i_j$ is the matrix indicator function defined by keeping
rows corresponding to the $j$th class and setting all other rows
equal to zero. Note that the optimization problem (1) reduces
to the conventional Lasso [23] when $D = 1$ and $d = 1$. In
the case when $D = 1$, (1) is referred to as the multivariate Lasso
[18].
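
The row-sparse norm in (1) and the decision rule (2) can be illustrated with a small sketch. This is only an illustration in plain Python, assuming the coefficient matrices have already been estimated; the helper names (`l1q_norm`, `keep_class_rows`, `classify`) and the toy data are hypothetical and not part of the original work.

```python
def l1q_norm(gamma, q=2.0):
    """||Gamma||_{1,q}: sum over rows of the l_q norm of each row."""
    return sum(sum(abs(v) ** q for v in row) ** (1.0 / q) for row in gamma)

def matmul(a, b):
    """Plain-Python matrix product of two matrices stored as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

def keep_class_rows(gamma, labels, j):
    """Indicator delta_j: keep rows whose atom has label j, zero the others."""
    return [row[:] if labels[k] == j else [0.0] * len(row)
            for k, row in enumerate(gamma)]

def residual(y, x, gamma_j):
    """Squared Frobenius norm of Y - X * gamma_j."""
    approx = matmul(x, gamma_j)
    return sum((y[r][c] - approx[r][c]) ** 2
               for r in range(len(y)) for c in range(len(y[0])))

def classify(ys, xs, gammas, labels, classes):
    """Rule (2): class with the smallest residual summed over modalities."""
    def total(j):
        return sum(residual(y, x, keep_class_rows(g, labels, j))
                   for y, x, g in zip(ys, xs, gammas))
    return min(classes, key=total)

# Toy example: two modalities, one atom per class (labels 0 and 1).
labels = [0, 1]
X1 = [[1.0, 0.0], [0.0, 1.0]]
X2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
G1 = [[0.0], [2.0]]           # only the class-1 row is active
G2 = [[0.0], [0.5]]           # same sparsity pattern in the second modality
Y1, Y2 = matmul(X1, G1), matmul(X2, G2)
predicted = classify([Y1, Y2], [X1, X2], [G1, G2], labels, classes=[0, 1])
```

Both modalities have different feature dimensions (2 and 3) but share the same active row, which is how the joint model lets them vote for a common class.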


B.  Robust  multimodal  multivariate  sparse  representation 

In this section, we consider a more general problem where
the data is contaminated by noise. In this case, the observation
model can be written as

$$Y^i = X^i \Gamma^i + Z^i + N^i, \quad i = 1, \dots, D, \tag{3}$$

where $N^i$ is a small dense additive noise and $Z^i \in \mathbb{R}^{n_i \times d_i}$
is a matrix of background noise (occlusion) with arbitrarily
large magnitude. One can assume that each $Z^i$ is sparsely
represented in some basis $B^i \in \mathbb{R}^{n_i \times m_i}$. That is, $Z^i = B^i \Lambda^i$
for some sparse matrices $\Lambda^i \in \mathbb{R}^{m_i \times d_i}$. Hence, (3) can be
rewritten as

$$Y^i = X^i \Gamma^i + B^i \Lambda^i + N^i, \quad i = 1, \dots, D. \tag{4}$$

With this model, one can simultaneously recover the coefficients
$\Gamma^i$ and $\Lambda^i$ by taking advantage of the fact that the $\Lambda^i$ are
sparse:

$$\hat{\Gamma}, \hat{\Lambda} = \arg\min_{\Gamma, \Lambda} \frac{1}{2} \sum_{i=1}^{D} \|Y^i - X^i \Gamma^i - B^i \Lambda^i\|_F^2 + \lambda_1 \|\Gamma\|_{1,q} + \lambda_2 \|\Lambda\|_1, \tag{5}$$

where $\lambda_1$ and $\lambda_2$ are positive parameters and $\Lambda =
[\Lambda^1, \Lambda^2, \dots, \Lambda^D]$ is the sparse coefficient matrix corresponding
to occlusion. The $\ell_1$-norm of the matrix $\Lambda$ is defined as
$\|\Lambda\|_1 = \sum_{i,j} |\Lambda_{i,j}|$. Note that the idea of exploiting the
sparsity of the occlusion term has been studied by Wright et al.
[10] and Candes et al. [24].

Once $\hat{\Gamma}, \hat{\Lambda}$ are computed, the effect of occlusion can be
removed by setting $\tilde{Y}^i = Y^i - B^i \hat{\Lambda}^i$. One can then declare
the class label associated with an observed vector as

$$\hat{j} = \arg\min_{j} \sum_{i=1}^{D} \|Y^i - X^i \delta^i_j(\hat{\Gamma}^i) - B^i \hat{\Lambda}^i\|_F^2. \tag{6}$$

C.  Optimization  algorithm 

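In this section, we present an algorithm to solve (5) based
on the classical alternating direction method of multipliers
(ADMM) [25], [26]. Note that the optimization problem (1)
can be solved by setting $\lambda_2$ equal to zero. Let

$$\mathcal{C}(\Gamma, \Lambda) = \frac{1}{2} \sum_{i=1}^{D} \|Y^i - X^i \Gamma^i - B^i \Lambda^i\|_F^2.$$

Then, our goal is to solve the following optimization problem

$$\min_{\Gamma, \Lambda} \mathcal{C}(\Gamma, \Lambda) + \lambda_1 \|\Gamma\|_{1,q} + \lambda_2 \|\Lambda\|_1. \tag{7}$$

In ADMM the idea is to decouple $\mathcal{C}(\Gamma, \Lambda)$, $\|\Gamma\|_{1,q}$ and $\|\Lambda\|_1$
by introducing auxiliary variables to reformulate the problem
into a constrained optimization problem

$$\min_{\Gamma, \Lambda, U, V} \mathcal{C}(\Gamma, \Lambda) + \lambda_1 \|V\|_{1,q} + \lambda_2 \|U\|_1 \quad \text{s.t.} \quad \Gamma = V, \ \Lambda = U. \tag{8}$$

Since (8) is an equality-constrained problem, the Augmented
Lagrangian method (ALM) [25] can be used to solve the
problem. This can be done by minimizing the augmented
Lagrangian function $f_{\alpha_\Gamma, \alpha_\Lambda}(\Gamma, \Lambda, V, U; A_\Lambda, A_\Gamma)$ defined as

$$\mathcal{C}(\Gamma, \Lambda) + \lambda_2 \|U\|_1 + \langle A_\Lambda, \Lambda - U \rangle + \frac{\alpha_\Lambda}{2} \|\Lambda - U\|_F^2 + \lambda_1 \|V\|_{1,q} + \langle A_\Gamma, \Gamma - V \rangle + \frac{\alpha_\Gamma}{2} \|\Gamma - V\|_F^2, \tag{9}$$

where $A_\Lambda$ and $A_\Gamma$ are the multipliers of the two linear
constraints, and $\alpha_\Lambda$, $\alpha_\Gamma$ are the positive penalty parameters.
The ALM algorithm minimizes $f_{\alpha_\Gamma, \alpha_\Lambda}(\Gamma, \Lambda, V, U; A_\Lambda, A_\Gamma)$ with
respect to $\Gamma$, $\Lambda$, $U$ and $V$ jointly, keeping $A_\Gamma$ and $A_\Lambda$ fixed,
and then updates $A_\Gamma$ and $A_\Lambda$ keeping the remaining variables
fixed. Due to the separable structure of the objective function
$f_{\alpha_\Gamma, \alpha_\Lambda}$, one can further simplify the problem by minimizing
$f_{\alpha_\Gamma, \alpha_\Lambda}$ with respect to the variables $\Gamma$, $\Lambda$, $U$ and $V$ separately.
The different steps of the algorithm are given in Algorithm 1.
In what follows, we describe each of the suboptimization
problems in detail.

Algorithm 1: Alternating Direction Method of Multipliers
(ADMM).

Initialize: $\Gamma_0$, $U_0$, $V_0$, $A_{\Lambda,0}$, $A_{\Gamma,0}$, $\alpha_\Gamma$, $\alpha_\Lambda$

While not converged do

1. $\Gamma_{t+1} = \arg\min_{\Gamma} f_{\alpha_\Gamma, \alpha_\Lambda}(\Gamma, \Lambda_t, U_t, V_t; A_{\Gamma,t}, A_{\Lambda,t})$

2. $\Lambda_{t+1} = \arg\min_{\Lambda} f_{\alpha_\Gamma, \alpha_\Lambda}(\Gamma_{t+1}, \Lambda, U_t, V_t; A_{\Gamma,t}, A_{\Lambda,t})$

3. $U_{t+1} = \arg\min_{U} f_{\alpha_\Gamma, \alpha_\Lambda}(\Gamma_{t+1}, \Lambda_{t+1}, U, V_t; A_{\Gamma,t}, A_{\Lambda,t})$

4. $V_{t+1} = \arg\min_{V} f_{\alpha_\Gamma, \alpha_\Lambda}(\Gamma_{t+1}, \Lambda_{t+1}, U_{t+1}, V; A_{\Gamma,t}, A_{\Lambda,t})$

5. $A_{\Gamma,t+1} = A_{\Gamma,t} + \alpha_\Gamma (\Gamma_{t+1} - V_{t+1})$

6. $A_{\Lambda,t+1} = A_{\Lambda,t} + \alpha_\Lambda (\Lambda_{t+1} - U_{t+1})$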

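1) Update step for $\Gamma$: The first suboptimization problem
involves the minimization of $f_{\alpha_\Gamma, \alpha_\Lambda}(\Gamma, \Lambda, V, U; A_\Lambda, A_\Gamma)$
with respect to $\Gamma$. It has a quadratic structure, which is easy
to solve by setting the first-order derivative equal to zero.
Furthermore, since the loss function $\mathcal{C}(\Gamma, \Lambda)$ is a sum of convex
functions associated with the sub-matrices $\Gamma^i$, one can solve for each
$\Gamma^i_{t+1}$, $i = 1, \dots, D$, separately, which has the following solution

$$\Gamma^i_{t+1} = ({X^i}^T X^i + \alpha_\Gamma I)^{-1} ({X^i}^T (Y^i - \Lambda^i_t) + \alpha_\Gamma V^i_t - A^i_{\Gamma,t}),$$

where $I$ is the $p \times p$ identity matrix and $\Lambda^i_t$, $V^i_t$ and $A^i_{\Gamma,t}$ are
submatrices of $\Lambda_t$, $V_t$ and $A_{\Gamma,t}$, respectively.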
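
As a rough illustration of this step, the sketch below solves the linear system obtained by setting the gradient of the augmented Lagrangian (9) with respect to one block $\Gamma^i$ to zero, i.e. $({X^i}^T X^i + \alpha_\Gamma I)\Gamma^i = {X^i}^T(Y^i - \Lambda^i) + \alpha_\Gamma V^i - A^i_\Gamma$. It assumes NumPy is available and that the occlusion basis is the identity; the function name `update_gamma` and the random test data are hypothetical.

```python
import numpy as np

def update_gamma(X, Y, Lam, V, A, alpha):
    """Solve (X^T X + alpha I) Gamma = X^T (Y - Lam) + alpha V - A.

    Assumes the occlusion basis B is the identity, so the occlusion
    term enters simply as Lam.
    """
    p = X.shape[1]
    lhs = X.T @ X + alpha * np.eye(p)
    rhs = X.T @ (Y - Lam) + alpha * V - A
    # A linear solve is used instead of forming the explicit inverse.
    return np.linalg.solve(lhs, rhs)

# Random data just to exercise the update (n=5 features, p=4 atoms, d=3 samples).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))
Y = rng.standard_normal((5, 3))
Lam = rng.standard_normal((5, 3))
V = rng.standard_normal((4, 3))
A = rng.standard_normal((4, 3))
alpha = 1.0

gamma_next = update_gamma(X, Y, Lam, V, A, alpha)
# Gradient of the augmented Lagrangian w.r.t. Gamma at the solution.
grad = X.T @ (X @ gamma_next + Lam - Y) + A + alpha * (gamma_next - V)
```

Because the system matrix depends only on the dictionary and $\alpha_\Gamma$, an implementation could factor it once and reuse the factorization across iterations.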

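2) Update step for $\Lambda$: The second suboptimization problem
is similar in nature, and its solution is given below

$$\Lambda^i_{t+1} = (1 + \alpha_\Lambda)^{-1} (Y^i - X^i \Gamma^i_{t+1} + \alpha_\Lambda U^i_t - A^i_{\Lambda,t}),$$

where $U^i_t$ and $A^i_{\Lambda,t}$ are submatrices of $U_t$ and $A_{\Lambda,t}$,
respectively.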

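3) Update step for $U$: The third suboptimization problem
is with respect to $U$, which is the standard $\ell_1$ minimization
problem and can be recast as

$$\min_{U} \frac{1}{2} \|\Lambda_{t+1} + \alpha_\Lambda^{-1} A_{\Lambda,t} - U\|_F^2 + \frac{\lambda_2}{\alpha_\Lambda} \|U\|_1. \tag{10}$$

Equation (10) is the well-known shrinkage problem whose
solution is given by

$$U_{t+1} = S\left(\Lambda_{t+1} + \alpha_\Lambda^{-1} A_{\Lambda,t}, \ \frac{\lambda_2}{\alpha_\Lambda}\right),$$

where $S(a, b) = \mathrm{sgn}(a)(|a| - b)$ for $|a| > b$ and zero
otherwise.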
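
The shrinkage operator is simple enough to write out directly. The following is a minimal plain-Python sketch of $S(a, b)$ applied entry-wise to $\Lambda_{t+1} + \alpha_\Lambda^{-1} A_{\Lambda,t}$ with threshold $\lambda_2 / \alpha_\Lambda$; the function names and the list-of-rows matrix layout are illustrative assumptions.

```python
def soft_threshold(a, b):
    """S(a, b) = sgn(a) * (|a| - b) if |a| > b, and 0 otherwise."""
    if abs(a) > b:
        magnitude = abs(a) - b
        return magnitude if a > 0 else -magnitude
    return 0.0

def update_u(lam, a_lam, lambda2, alpha):
    """Entry-wise shrinkage of Lambda + A_Lambda / alpha with threshold lambda2 / alpha."""
    threshold = lambda2 / alpha
    return [[soft_threshold(l + a / alpha, threshold)
             for l, a in zip(l_row, a_row)]
            for l_row, a_row in zip(lam, a_lam)]

# Small entries are set exactly to zero; large ones shrink toward zero.
u_next = update_u([[2.0, -0.5]], [[0.0, 0.0]], lambda2=1.0, alpha=2.0)
```

Entries whose magnitude does not exceed the threshold are zeroed, which is what keeps the estimated occlusion term sparse.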


4) Update step for $V$: The final suboptimization problem
is with respect to $V$ and can be reformulated as

$$\min_{V} \frac{1}{2} \|\Gamma_{t+1} + \alpha_\Gamma^{-1} A_{\Gamma,t} - V\|_F^2 + \frac{\lambda_1}{\alpha_\Gamma} \|V\|_{1,q}. \tag{11}$$

Due to the separable structure of (11), it can be solved by
minimizing with respect to each row of $V$ separately. Let
$\gamma_{i,t+1}$, $a_{\Gamma,i,t}$ and $v_{i,t+1}$ be rows of the matrices $\Gamma_{t+1}$, $A_{\Gamma,t}$ and
$V_{t+1}$, respectively. Then for each $i = 1, \dots, p$ we solve the
following sub-problem

$$v_{i,t+1} = \arg\min_{v} \frac{1}{2} \|z - v\|_2^2 + \eta \|v\|_q, \tag{12}$$

where $z = \gamma_{i,t+1} + a_{\Gamma,i,t} \alpha_\Gamma^{-1}$ and $\eta = \lambda_1 / \alpha_\Gamma$. One can derive the
solution for (12) for any $q$. In this paper, we only focus on
the case when $q = 2$. The solution of (12) has the following
form

$$v_{i,t+1} = \left(1 - \frac{\eta}{\|z\|_2}\right)_+ z,$$

where $(v)_+$ is a vector with entries receiving values
$\max(v_i, 0)$.
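
For $q = 2$, the row-wise problem (12) has the standard closed-form solution in which the whole row $z$ is scaled by $(1 - \eta / \|z\|_2)_+$. A minimal plain-Python sketch follows; the names `shrink_row` and `update_v` are hypothetical, and forming $z$ as the row of $\Gamma_{t+1} + \alpha_\Gamma^{-1} A_{\Gamma,t}$ follows the form of (11).

```python
import math

def shrink_row(z, eta):
    """Minimizer of 0.5 * ||z - v||_2^2 + eta * ||v||_2 over v."""
    norm = math.sqrt(sum(v * v for v in z))
    if norm <= eta:
        # The whole row is zeroed: this is what produces row sparsity.
        return [0.0] * len(z)
    scale = 1.0 - eta / norm
    return [scale * v for v in z]

def update_v(gamma, a_gamma, lambda1, alpha):
    """Apply the row-wise shrinkage to Gamma + A_Gamma / alpha."""
    eta = lambda1 / alpha
    rows = []
    for g_row, a_row in zip(gamma, a_gamma):
        z = [g + a / alpha for g, a in zip(g_row, a_row)]
        rows.append(shrink_row(z, eta))
    return rows

# A strong row is shrunk but kept; a weak row is removed entirely.
v_next = update_v([[3.0, 4.0], [0.3, 0.4]], [[0.0, 0.0], [0.0, 0.0]],
                  lambda1=1.0, alpha=1.0)
```

Unlike the entry-wise shrinkage used for $U$, this operator keeps or discards a row as a unit, so all modalities and observations share the same set of active atoms.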

Our proposed algorithm, Sparse Multimodal Biometrics
Recognition (SMBR), is summarized in Algorithm 2. We
refer to the robust method taking sparse error into account
as SMBR-E (SMBR with error), and the initial case where it
is not taken into account as SMBR-WE (SMBR without error).

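Algorithm 2: Sparse Multimodal Biometrics Recognition
(SMBR).

Input: Training samples $\{X^i\}_{i=1}^{D}$, test sample $\{Y^i\}_{i=1}^{D}$, occlusion
basis $\{B^i\}_{i=1}^{D}$

Procedure: Obtain $\hat{\Gamma}$ and $\hat{\Lambda}$ by solving

$$\hat{\Gamma}, \hat{\Lambda} = \arg\min_{\Gamma, \Lambda} \frac{1}{2} \sum_{i=1}^{D} \|Y^i - X^i \Gamma^i - B^i \Lambda^i\|_F^2 + \lambda_1 \|\Gamma\|_{1,q} + \lambda_2 \|\Lambda\|_1$$

Output:

$$\mathrm{identity}(Y) = \arg\min_{j} \sum_{i=1}^{D} \|Y^i - X^i \delta^i_j(\hat{\Gamma}^i) - B^i \hat{\Lambda}^i\|_F^2.$$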


III.  Quality  based  fusion 

Ideally, a fusion mechanism should give more weight to
the more reliable modalities. Hence, the concept of quality
is important in multimodal fusion. A quality measure based
on sparse representation was introduced for faces in [10]. To
decide whether a given test sample has good quality or not,
its Sparsity Concentration Index (SCI) was calculated. Given
a coefficient vector $\gamma \in \mathbb{R}^p$, the SCI is given as:

$$\mathrm{SCI}(\gamma) = \frac{C \cdot \max_{i \in \{1, \dots, C\}} \|\delta_i(\gamma)\|_1 / \|\gamma\|_1 - 1}{C - 1}$$

where $\delta_i$ is the indicator function keeping the coefficients
corresponding to the $i$th class and setting others to zero. SCI
values close to 1 correspond to the case where the test sample
can be represented well using the samples of a single class,
and hence is of high quality. On the other hand, samples with SCI
close to 0 are not similar to any of the classes, and hence are
of poor quality. This can be easily extended to the multimodal
case using the joint sparse representation matrix $\hat{\Gamma}$. In this
case, we can define the quality $q^i_j$ for sample $y^i_j$ as:

$$q^i_j = \mathrm{SCI}(\hat{\Gamma}^i_j)$$

where $\hat{\Gamma}^i_j$ is the $j$th column of $\hat{\Gamma}^i$. Given this quality measure,
the classification rule (2) can be modified to include the quality
measure:

$$\hat{j} = \arg\min_{j} \sum_{i=1}^{D} \sum_{k=1}^{d_i} q^i_k \|y^i_k - X^i \delta_j(\hat{\Gamma}^i_k)\|_2^2 \tag{13}$$

where $\delta_j$ is the indicator function retaining the coefficients
corresponding to the $j$th class.
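
The Sparsity Concentration Index can be computed in a few lines. The sketch below is a plain-Python illustration that assumes each coefficient carries an integer class label in `0..C-1`; the function name `sci` and the handling of an all-zero vector are assumptions made for the example, not part of the original definition.

```python
def sci(gamma, labels, num_classes):
    """Sparsity Concentration Index of a coefficient vector.

    gamma: list of coefficients, labels: class index of each coefficient.
    Returns a value in [0, 1]; 1 means all energy sits in one class.
    """
    total = sum(abs(g) for g in gamma)
    if total == 0.0:
        # Assumption: an all-zero vector is treated as lowest quality.
        return 0.0
    per_class = [0.0] * num_classes
    for g, c in zip(gamma, labels):
        per_class[c] += abs(g)
    return (num_classes * max(per_class) / total - 1.0) / (num_classes - 1.0)

concentrated = sci([1.0, 2.0, 0.0, 0.0], [0, 0, 1, 1], num_classes=2)
spread_out = sci([1.0, 1.0], [0, 1], num_classes=2)
```

In the weighted rule, a modality or observation with a spread-out coefficient vector receives a weight near 0 and contributes little to the fused decision.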

IV. Kernel space multimodal biometrics recognition

The class identities in the multibiometric dataset may not
be linearly separable. Hence, we also extend the sparse
multimodal fusion framework to kernel space. The kernel function,
$\kappa : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$, is defined as the inner product

$$\kappa(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$$

where $\phi$ is an implicit mapping projecting the vector $x$ into
a higher dimensional space.

A. Multivariate kernel sparse representation

Considering the general case of $D$ modalities with $\{Y^i\}_{i=1}^{D}$
as a set of $d_i$ observations, the feature space representation can
be written as:

$$\Phi(Y^i) = [\phi(y^i_1), \phi(y^i_2), \dots, \phi(y^i_{d_i})]$$

Similarly, the dictionary of training samples for modality $i =
1, \dots, D$ can be represented in feature space as

$$\Phi(X^i) = [\phi(X^i_1), \phi(X^i_2), \dots, \phi(X^i_C)]$$

As in joint linear space representation, we have:

$$\Phi(Y^i) = \Phi(X^i) \Gamma^i$$

where $\Gamma^i$ is the coefficient matrix associated with modality
$i$. Incorporating information from all the sensors, we seek to
solve the following optimization problem similar to the linear
case:

$$\hat{\Gamma} = \arg\min_{\Gamma} \frac{1}{2} \sum_{i=1}^{D} \|\Phi(Y^i) - \Phi(X^i) \Gamma^i\|_F^2 + \lambda \|\Gamma\|_{1,q} \tag{14}$$

where $\Gamma = [\Gamma^1, \Gamma^2, \dots, \Gamma^D]$. It is clear that the information
from all modalities is integrated via the shared sparsity
pattern of the matrices $\{\Gamma^i\}_{i=1}^{D}$. This can be reformulated in
terms of kernel matrices as:

$$\hat{\Gamma} = \arg\min_{\Gamma} \frac{1}{2} \sum_{i=1}^{D} \left( \mathrm{trace}({\Gamma^i}^T K_{X^i,X^i} \Gamma^i) - 2\,\mathrm{trace}(K_{X^i,Y^i} \Gamma^i) \right) + \lambda \|\Gamma\|_{1,q} \tag{15}$$

where the kernel matrix $K_{A,B}$ is defined as:

$$K_{A,B}(i,j) = \langle \phi(a_i), \phi(b_j) \rangle \tag{16}$$

$a_i$ and $b_j$ being the $i$th and $j$th columns of $A$ and $B$, respectively.

B. Composite kernel sparse representation

Another way to combine information of different modalities
is through a composite kernel, which efficiently combines the kernels
for each modality. The kernel combines both within and
between similarities of different modalities. For two modalities
with the same feature dimension, the kernel matrix can be
constructed as:

$$\kappa(X_i, X_j) = \alpha_1 \kappa(x^1_i, x^1_j) + \alpha_2 \kappa(x^1_i, x^2_j) + \alpha_3 \kappa(x^2_i, x^1_j) + \alpha_4 \kappa(x^2_i, x^2_j) \tag{17}$$

where $\{\alpha_i\}_{i=1,\dots,4}$ are the weights associated with the kernels
and $X_i = [x^1_i; x^2_i]$, with $x^1_i$ and $x^2_i$ the feature vectors for
modality 1 and 2, respectively. It can be similarly extended
to multiple modalities. However, the modalities may be of
different dimensions. In such cases, a cross-similarity measure
is not possible. Hence, the modalities are divided according
to being homogeneous (e.g. right and left iris) or heterogeneous
(fingerprint and iris), as homogeneous modalities have the
same feature extraction process and hence the same dimension.
This is also reasonable, because homogeneous modalities are
correlated at feature level but heterogeneous modalities may
not be correlated. A composite kernel is defined for each
homogeneous modality. For $D$ modalities, with $\{d_i\}_{i \in S_j}$, $S_j \subseteq
\{1, 2, \dots, D\}$ being the sets of indices of each homogeneous
modality, the composite kernel for each set will be given as:

$$\kappa(X^k_i, X^k_j) = \sum_{s_1, s_2 \in S_k} \alpha_{s_1 s_2} \kappa(x^{s_1}_i, x^{s_2}_j) \tag{18}$$

where $X^k_i = [x^{s_1}_i; x^{s_2}_i; \dots; x^{s_{|S_k|}}_i]$, $S_k = [s_1, s_2, \dots, s_{|S_k|}]$
and $k = 1, \dots, N_H$, $N_H$ being the number of different
heterogeneous modalities. The information from the different
heterogeneous modalities can then be combined similar to the
sparse kernel fusion case:

$$\hat{\Gamma} = \arg\min_{\Gamma} \frac{1}{2} \sum_{i=1}^{N_H} \left( \mathrm{trace}({\Gamma^i}^T K_{X^i,X^i} \Gamma^i) - 2\,\mathrm{trace}(K_{X^i,Y^i} \Gamma^i) \right) + \lambda \|\Gamma\|_{1,q} \tag{19}$$

where $K_{X^i,X^i}$ is defined for each $S_i$ as in (16) and $\Gamma =
[\Gamma^1, \Gamma^2, \dots, \Gamma^{N_H}]$.
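
The composite kernel for a group of homogeneous modalities is a weighted sum of a base kernel evaluated on every pair of modalities. The plain-Python sketch below illustrates that structure; the Gaussian (RBF) base kernel, its bandwidth `sigma`, the weight layout `weights[s1][s2]`, and the function names are assumptions chosen for the example, since the text above does not fix them.

```python
import math

def rbf(u, v, sigma=1.0):
    """Gaussian base kernel (an assumed choice of kappa)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def composite_kernel(xi, xj, weights, base=rbf):
    """Weighted sum of base(xi[s1], xj[s2]) over all modality pairs.

    xi, xj: lists of per-modality feature vectors of equal dimension.
    weights[s1][s2]: weight of the (s1, s2) within/between-modality term.
    """
    return sum(weights[s1][s2] * base(xi[s1], xj[s2])
               for s1 in range(len(xi)) for s2 in range(len(xj)))

# Two homogeneous modalities (e.g. left and right iris) for two samples.
sample_a = [[0.0], [1.0]]
sample_b = [[0.0], [1.0]]
within_only = composite_kernel(sample_a, sample_b, [[1.0, 0.0], [0.0, 1.0]])
with_cross = composite_kernel(sample_a, sample_b, [[1.0, 1.0], [1.0, 1.0]])
```

Setting the off-diagonal weights to zero keeps only the within-modality similarities; nonzero off-diagonal weights add the cross-modality terms, which is only meaningful when the modalities share a feature dimension.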

C.  Optimization  Algorithm 

Similar to the linear fusion method, we apply the alternating
direction method to efficiently solve the problem for kernel
fusion. The method splits the variable $\Gamma$ such that the new
problem has two convex functions. This is done by introducing
a new variable $V$ and reformulating the problems (15) and (19)
as:

$$\arg\min_{\Gamma, V} \frac{1}{2} \sum_{i=1}^{N_K} \left( \mathrm{trace}({\Gamma^i}^T K_{X^i,X^i} \Gamma^i) - 2\,\mathrm{trace}(K_{X^i,Y^i} \Gamma^i) \right) + \lambda \|V\|_{1,q} \quad \text{s.t.} \quad \Gamma = V \tag{20}$$

where $N_K$ is the number of kernels in (15) and (19). Rewriting
the problem using the Lagrangian multiplier, the optimization
becomes:

$$\arg\min_{\Gamma, V} \frac{1}{2} \sum_{i=1}^{N_K} \left( \mathrm{trace}({\Gamma^i}^T K_{X^i,X^i} \Gamma^i) - 2\,\mathrm{trace}(K_{X^i,Y^i} \Gamma^i) \right) + \lambda \|V\|_{1,q} + \langle B, \Gamma - V \rangle + \frac{\beta_W}{2} \|\Gamma - V\|_F^2 \tag{21}$$

which upon re-arranging becomes:

$$\arg\min_{\Gamma, V} \frac{1}{2} \sum_{i=1}^{N_K} \left( \mathrm{trace}({\Gamma^i}^T K_{X^i,X^i} \Gamma^i) - 2\,\mathrm{trace}(K_{X^i,Y^i} \Gamma^i) \right) + \lambda \|V\|_{1,q} + \frac{\beta_W}{2} \left\|\Gamma - V + \frac{1}{\beta_W} B\right\|_F^2 \tag{22}$$

The optimization algorithm is summarized in Algorithm 3.
Each of the optimization steps has a simple closed-form expression.

Algorithm 3: Alternating Direction Method of Multipliers
(ADMM) in kernel space.

Initialize: $\Gamma_0$, $V_0$, $B_0$, $\beta_W$

While not converged do

1. $\Gamma_{t+1} = \arg\min_{\Gamma} \frac{1}{2} \sum_{i=1}^{N_K} \left( \mathrm{trace}({\Gamma^i}^T K_{X^i,X^i} \Gamma^i) - 2\,\mathrm{trace}(K_{X^i,Y^i} \Gamma^i) \right) + \lambda \|V_t\|_{1,q} + \frac{\beta_W}{2} \|\Gamma - V_t + \frac{1}{\beta_W} B_t\|_F^2$

2. $V_{t+1} = \arg\min_{V} \lambda \|V\|_{1,q} + \frac{\beta_W}{2} \|\Gamma_{t+1} - V + \frac{1}{\beta_W} B_t\|_F^2$

3. $B_{t+1} = B_t + \beta_W (\Gamma_{t+1} - V_{t+1})$

1) Update steps for $\Gamma_t$: $\Gamma_{t+1}$ is obtained by updating each
submatrix $\Gamma^i_t$, $i = 1, \dots, N_K$, as:

$$\Gamma^i_{t+1} = (K_{X^i,X^i} + \beta_W I)^{-1} (K_{X^i,Y^i} + \beta_W V^i_t - B^i_t) \tag{23}$$

where $I$ is an identity matrix and $V^i_t$, $B^i_t$ are sub-matrices
of $V_t$ and $B_t$, respectively.

2) Update steps for $V_t$: The update equation for $V_t$ is the
same as in the linear fusion case using (11) and (12), replacing
$A_{\Gamma,t}$ and $\alpha_\Gamma$ with $B_t$ and $\beta_W$, respectively.
D.  Classification 

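Once $\hat{\Gamma}$ is obtained using either of the two methods above,
classification can be done by assigning the class label as:

$$\hat{j} = \arg\min_{j} \sum_{i=1}^{N_K} \|\Phi(Y^i) - \Phi(X^i_j) \hat{\Gamma}^i_j\|_F^2$$

or in terms of kernel matrices as:

$$\hat{j} = \arg\min_{j} \sum_{i=1}^{N_K} \left( \mathrm{trace}(K_{Y^i,Y^i}) - 2\,\mathrm{trace}({\hat{\Gamma}^i_j}^T K_{X^i_j,Y^i}) + \mathrm{trace}({\hat{\Gamma}^i_j}^T K_{X^i_j,X^i_j} \hat{\Gamma}^i_j) \right) \tag{24}$$

Here, $X^i_j$ is the sub-dictionary associated with the $j$th class and
$\hat{\Gamma}^i_j$ is the coefficient matrix associated with this class.

The classification rule can be further extended to include
the quality measure as in (13). However, we skip this step here,
as we wish to study the effect of kernel representation and
quality separately.

Multivariate Kernel Sparse Recognition (kerSMBR) and
Composite Kernel Sparse Recognition (compSMBR) algorithms
are summarized in Algorithms 4 and 5, respectively:

Algorithm 4: Kernel Sparse Multimodal Biometrics Recognition
(kerSMBR).

Input: Training samples $\{X^i\}_{i=1}^{D}$, test sample $\{Y^i\}_{i=1}^{D}$

Procedure: Obtain $\hat{\Gamma}$ by solving

$$\hat{\Gamma} = \arg\min_{\Gamma} \frac{1}{2} \sum_{i=1}^{D} \|\Phi(Y^i) - \Phi(X^i) \Gamma^i\|_F^2 + \lambda \|\Gamma\|_{1,q} \tag{25}$$

Output: $\mathrm{identity}(Y) = \arg\min_{j} \sum_{i=1}^{D} \left( \mathrm{trace}(K_{Y^i,Y^i}) - 2\,\mathrm{trace}({\hat{\Gamma}^i_j}^T K_{X^i_j,Y^i}) + \mathrm{trace}({\hat{\Gamma}^i_j}^T K_{X^i_j,X^i_j} \hat{\Gamma}^i_j) \right)$

Algorithm 5: Composite Kernel Sparse Multimodal Biometrics
Recognition (compSMBR).

Input: Training samples $\{X^i\}_{i=1}^{D}$, test sample $\{Y^i\}_{i=1}^{D}$

Procedure: Obtain $\hat{\Gamma}$ by solving

$$\hat{\Gamma} = \arg\min_{\Gamma} \frac{1}{2} \sum_{i=1}^{N_H} \left( \mathrm{trace}({\Gamma^i}^T K_{X^i,X^i} \Gamma^i) - 2\,\mathrm{trace}(K_{X^i,Y^i} \Gamma^i) \right) + \lambda \|\Gamma\|_{1,q} \tag{26}$$

Output: $\mathrm{identity}(Y) = \arg\min_{j} \sum_{i=1}^{N_H} \left( \mathrm{trace}(K_{Y^i,Y^i}) - 2\,\mathrm{trace}({\hat{\Gamma}^i_j}^T K_{X^i_j,Y^i}) + \mathrm{trace}({\hat{\Gamma}^i_j}^T K_{X^i_j,X^i_j} \hat{\Gamma}^i_j) \right)$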


V.  Experiments 

We  evaluated  our  algorithm  for  different  multi-biometric 
settings.  We  tested  on  two  publicly  available  datasets  -  the 
WVU  Multimodal  dataset  [27]  and  the  AR  face  dataset  [28]. 
The  WVU  dataset  is  one  of  the  few  publicly  available  datasets 
which  allows  fusion  at  image  level.  It  is  a  challenging  dataset 
having samples from different biometric modalities for each
subject. 

In  the  second  experiment,  we  show  the  applicability  of  our 
method  to  fusing  information  from  soft  biometrics.  Recently, 
combining  information  from  soft  biometrics  such  as  facial 
marks, hair color, etc. has been shown to improve face recognition
performance [29]. The challenge for fusion algorithms is
to combine these weak modalities with strong modalities such as face
or fingerprint [30]. We demonstrate that our framework can
be extended to address this problem. Further, we also show
the effect of noise and occlusion on the performance of different
algorithms. In all the experiments, $B^i$ was set to be the identity
for convenience, i.e., we assume the noise to be sparse in the image
domain.

A.  WVU  Multimodal  Dataset 

The  WVU  multimodal  dataset  is  a  comprehensive  collection 
of  different  biometric  modalities  such  as  fingerprint,  iris, 
palmprint,  hand  geometry  and  voice  from  subjects  of  different 
age,  gender  and  ethnicity  as  described  in  Table  I.  It  is  a 
challenging  dataset  and  many  of  these  samples  are  corrupted 
with  blur,  occlusion  and  sensor  noise  as  shown  in  Figure  2.  Out 
of  these,  we  chose  iris  and  fingerprint  modalities  for  testing  the 
proposed  algorithms.  In  total,  there  are  2  iris  (right  and  left  iris) 
and  4  fingerprint  modalities.  Also,  the  evaluation  was  done  on 
a  subset  of  219  subjects  having  samples  in  both  modalities. 

Fig. 2: Examples of challenging images from the WVU
Multimodal dataset. The images above suffer from various
artifacts such as sensor noise, blur, occlusion and poor acquisition.


Biometric Modality   # of subjects   # of samples
Iris                 244             3099
Fingerprint          272             7219
Palm                 263             683
Hand                 217             3062
Voice                274             714

TABLE I: WVU Biometric Data


1) Preprocessing: Robust pre-processing of images was
done before feature extraction. Iris images were segmented
following the recent method proposed in [31]. Following the
segmentation, 25 x 240 iris templates were formed by resampling
using the publicly available code of Masek et al.
[32]. Fingerprint images were enhanced using the filtering
based  methods  described  in  [33],  and  then  the  core  point  was 
detected  using  the  enhanced  images  [34].  Features  were  then 
extracted  around  the  detected  core  point. 

2) Feature Extraction: Gabor features were extracted on
the processed images as they have been shown to give good
performance on both fingerprints [34] and iris [35]. For
fingerprint samples, the processed images were convolved with
Gabor filters at 8 different orientations. Circular tessellations
were extracted around the core point for all the filtered images,
similar to [34]. The tessellation consisted of 15 concentric
bands, each of width 5 pixels and divided into 30 sectors. The
mean values for each sector were concatenated to form the
feature vector of size 3600 x 1 (15 bands x 30 sectors x 8
orientations). Features for iris images were
formed by convolving the templates with a log-Gabor filter at a
single scale, and vectorizing the 25 x 240 template to give a 6000 x 1
dimensional feature.
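
The fingerprint feature layout described above can be sketched as follows. This is only an illustrative sketch, not the implementation used in the experiments: the function names `sector_means` and `fingerprint_feature` are hypothetical, the bands are assumed to start at the core point itself, and the Gabor filtering step is assumed to have been done already.

```python
import numpy as np

def sector_means(filtered, core, n_bands=15, band_width=5, n_sectors=30):
    """Mean of one Gabor-filtered image over each (band, sector) cell.

    `core` is the (row, col) core point. Bands are assumed to start at the
    core point (an assumption; [34] may exclude an innermost region).
    """
    rows, cols = np.indices(filtered.shape)
    dy, dx = rows - core[0], cols - core[1]
    radius = np.hypot(dy, dx)
    angle = np.mod(np.arctan2(dy, dx), 2 * np.pi)
    band = (radius // band_width).astype(int)
    # Clamp guards against an angle that rounds to exactly 2*pi.
    sector = np.minimum((angle / (2 * np.pi) * n_sectors).astype(int),
                        n_sectors - 1)
    inside = band < n_bands
    cell = band[inside] * n_sectors + sector[inside]
    n_cells = n_bands * n_sectors
    sums = np.bincount(cell, weights=filtered[inside], minlength=n_cells)
    counts = np.bincount(cell, minlength=n_cells)
    return sums / np.maximum(counts, 1)  # empty cells are left at 0

def fingerprint_feature(filtered_images, core):
    """Concatenate the sector means of all orientation-filtered images."""
    return np.concatenate([sector_means(f, core) for f in filtered_images])
```

With 8 filtered images, the returned vector has 15 x 30 x 8 = 3600 entries, matching the feature size quoted in the text.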

3) Experimental Set-up: The dataset was randomly divided
into 4 training samples per class (1 sample here is 1 data
sample each from 6 modalities) and the remaining 519 for testing.
The recognition result was averaged over 5 runs. The proposed
methods were compared with state-of-the-art classification
methods such as sparse logistic regression (SLR) [36] and
SVM [37]. Although these methods have been shown to
give superior performance, they cannot handle multimodal
data directly. One possible way to handle multimodal data is
feature concatenation. However, this results in feature vectors
of size 26400 x 1 when all six modalities are used (4 x 3600 for
the fingerprints plus 2 x 6000 for the irises), which is
impractical. Hence, we explored score-level and decision-level
fusion methods for combining the results of individual modalities.
For score-level fusion, the probability outputs for the test sample
of each modality, $\{y^i\}_{i=1}^{6}$, were added together to give the
final score vector. Classification was based upon the final score
values. For decision-level fusion, the subject chosen by the
largest number of modalities was taken to be the predicted
class. We tested the proposed linear and kernel fusion
techniques separately and compared them with the linear and
kernel versions of SLR and SVM respectively. We denote the
score-level fusion of these methods as SLR-Sum and SVM-Sum,
and the decision-level fusion as SLR-Major and SVM-Major.
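
The two baseline fusion rules can be illustrated with a short sketch. The function names are hypothetical, and the tie-breaking rule in the majority vote (lowest class index wins) is an assumption, since the text does not specify one.

```python
import numpy as np

def sum_fusion(scores):
    """Score-level fusion: add per-modality class probabilities.

    `scores` has shape (n_modalities, n_classes).
    """
    return int(np.argmax(np.sum(scores, axis=0)))

def majority_fusion(scores):
    """Decision-level fusion: each modality votes for its top class.

    Ties are broken in favour of the lowest class index (assumption).
    """
    votes = np.argmax(scores, axis=1)
    counts = np.bincount(votes, minlength=scores.shape[1])
    return int(np.argmax(counts))

# Toy example with 3 modalities and 3 classes: one confident modality
# dominates the sum, while two weaker modalities win the vote.
toy_scores = np.array([[0.90, 0.10, 0.0],
                       [0.40, 0.60, 0.0],
                       [0.45, 0.55, 0.0]])
sum_choice = sum_fusion(toy_scores)         # class 0 (total 1.75 vs 1.25)
vote_choice = majority_fusion(toy_scores)   # class 1 (two votes to one)
```

The toy example shows why the two rules can disagree: sum fusion retains the confidence of each modality, whereas majority voting discards it.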

a) Linear Fusion: The recognition performances of
SMBR-WE and SMBR-E were compared with linear SVM and
linear SLR classification methods. The parameter values $\lambda_1$
and $\lambda_2$ were set to 0.01.

• Comparison of Methods: Figure 3 shows the performance
on individual modalities. All the classifiers show
a similar trend. The performance of all of them is lower
on iris images and fingers 1 and 3. However, the SVM
fares worse than the other methods on all the modalities.
Figure 4 and Table II show the performance for different
fusion settings. The proposed SMBR approach outperforms
existing classification techniques. SMBR-E
and SMBR-WE have similar performance, though the
latter seems to give slightly better performance. This
may be due to the penalty on the sparse error, as the
error may not be sparse in the image domain. Further,
sum-based fusion shows superior performance over
voting-based methods.

• Fusion with quality: Clearly, different modalities have different
quality of performance. Hence, we studied the effect
of the proposed quality measure on the performance
of different methods. For a consistent comparison, the
quality values produced by the SMBR-E method were used for
all the algorithms. Table III shows the performance for the
three fusion settings. The effect of including the quality
measure during classification can be studied by comparing
with Table II. The recognition rate increases or stays the same for
all the methods across the fusion settings. Again, SMBR-E
and SMBR-WE give the best performance among all
the methods.

• Effect of joint sparsity: We also studied the effect of the joint
sparsity constraint on the recognition performance. For
this, the SMBR-WE algorithm was run for different values
of $\lambda_1$. Figure 5 shows the variation of rank one recognition
across $\lambda_1$ values for different fusion settings. All the
curves show a sharp increase in performance as $\lambda_1$
increases from 0. Further, the increase is larger for iris fusion,
which shows around 5% improvement at $\lambda_1 = 0.005$
over $\lambda_1 = 0$. This shows that imposing the joint sparsity
constraint is important for fusion. Moreover, it helps in
regularizing fusion performance when the reconstruction error
alone is not sufficient to distinguish between different
classes. The performance is then stable across $\lambda_1$ values,
and starts decreasing slowly after reaching the optimum
performance.

• Variation with number of training samples: We varied
the number of training samples and studied the effect on
the top 4 algorithms. Figure 6 shows the variation for
fusion of all the modalities. It can be seen that SMBR-WE
and SMBR-E are stable across the number of training
samples, whereas the performance of the SLR and SVM
based methods falls sharply. The fall in performance of

Fig. 3: CMCs (Cumulative Match Curves) for individual modalities using (a) SMBR-E, (b) SMBR-WE, (c) SLR and (d) SVM
methods.


Fig. 4: CMCs for multimodal fusion using (a) four fingerprints, (b) two irises and (c) all modalities.


                  SMBR-WE  SMBR-E  SLR-Sum  SLR-Major  SVM-Sum  SVM-Major
4 Fingerprints    97.9     97.6    96.3     74.2       90.0     73.0
2 Irises          76.5     78.2    72.7     64.2       62.8     49.3
All modalities    98.7     98.6    97.6     84.4       94.9     81.3

TABLE II: Rank one recognition performance (%) for the WVU Multimodal dataset.

                  SMBR-WE  SMBR-E  SLR-Sum  SLR-Major  SVM-Sum  SVM-Major
4 Fingerprints    98.2     98.1    97.5     86.3       93.6     85.5
2 Irises          76.9     78.8    74.1     67.2       64.3     51.6
All modalities    98.8     98.6    98.2     93.8       95.5     93.3

TABLE III: Rank one recognition performance (%) using the proposed quality measure.


Fig. 5: Variation of rank one recognition performance with different
values of the sparsity constraint, $\lambda_1$.


SLR and SVM can be attributed to the discriminative
nature of these methods, as well as to score-based
fusion, since fusion further reduces recognition when
the individual classifiers are not good.


Fig. 6: Variation of rank one recognition performance with the number of
training samples.

• Comparison with other score-based fusion methods: Although
sum-based fusion is a popular technique for score
fusion, some other techniques have also been proposed.
We evaluated the performance of the likelihood-based fusion
method proposed in [38]. The results are shown in Table
IV. The method does not perform well. It models the score
distribution as a Gaussian mixture model, but this
distribution is difficult to model due
to the large variations in the data samples. The method is also
affected by the curse of dimensionality.


                  2 irises  4 fingerprints  All modalities
SLR-Likelihood    66.6      83.5            75.1
SVM-Likelihood    50.7      31.9            31.0

TABLE IV: Fusion performance (%) with the likelihood-based method
[38].
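
The rank one rates and CMC curves reported in this section can be computed from a matrix of per-class match scores. The sketch below is one common way to do this and is not taken from the paper. The function name `cmc_curve` is hypothetical, and larger scores are assumed to mean better matches, so for residual-based classifiers the negated residuals would be passed in.

```python
import numpy as np

def cmc_curve(scores, true_labels):
    """Cumulative match curve from a (n_test, n_classes) score matrix.

    Entry k-1 of the result is the fraction of test samples whose true
    class is among the k best-scoring classes; entry 0 is the rank one rate.
    """
    scores = np.asarray(scores, dtype=float)
    true_labels = np.asarray(true_labels)
    true_scores = scores[np.arange(len(true_labels)), true_labels]
    # Rank of the true class = 1 + number of classes scoring strictly higher.
    ranks = 1 + np.sum(scores > true_scores[:, None], axis=1)
    n_classes = scores.shape[1]
    hits = np.bincount(ranks, minlength=n_classes + 1)[1:]
    return np.cumsum(hits) / len(true_labels)
```

For example, with three test samples of which two are ranked first and one is ranked second, the curve starts at 2/3 and reaches 1 at rank two.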


b) Kernel Fusion: We further compared the performances
of the proposed kerSMBR and compSMBR with kernel
SVM and kernel SLR methods. In the experiments, we used the
Radial Basis Function (RBF) as the kernel, given as:

$$\kappa(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|_2^2}{\sigma^2}\right),$$

$\sigma$ being a parameter to control the width of the RBF.
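
As an illustration, the kernel matrix between two sets of feature vectors can be computed as in the sketch below. It assumes the form $\exp(-\|x_i - x_j\|_2^2 / \sigma^2)$ and samples stored as rows. The function name `rbf_kernel` is ours, and library implementations often use a different parameterization (for example $2\sigma^2$ or a `gamma` factor), so values are not directly interchangeable.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gram matrix K[i, j] = exp(-||a_i - b_j||_2^2 / sigma^2).

    A has shape (n_a, d) and B has shape (n_b, d); rows are samples.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    sq_dists = (np.sum(A ** 2, axis=1)[:, None]
                + np.sum(B ** 2, axis=1)[None, :]
                - 2.0 * A @ B.T)
    # Round-off can make tiny distances slightly negative; clip them to 0.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / sigma ** 2)
```

A larger $\sigma$ gives a wider kernel, so distant samples retain a larger similarity value.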

• Hyperparameter tuning: To fix the value of the hyperparameter
$\sigma$, we iterated over different values of $\sigma$,
$\{2^{-3}, 2^{-2}, \ldots, 2^{3}\}$, for one training and test split
of the data. The value of $\sigma$ giving the maximum performance
was fixed for each modality, and the performance
was averaged over a few iterations. The weights $\{\alpha_{ij}\}$
were set to 1 for the composite kernel. $\lambda$ and $\beta_W$ were both set
to 0.01.

• Comparison of methods: Figure 7 shows the performance
of different methods on individual modalities, and Figure
8 and Table V on different fusion settings. Comparison
with linear fusion shows that the proposed
kerSMBR significantly improves the performance on individual
iris modalities as well as iris fusion. The performance
on fingerprint modalities is similar; however, the
fusion of all 6 modalities shows an improvement of 0.4%.
kerSMBR also achieves the best accuracy among all
the methods for the different fusion settings. kerSLR scores
better than kerSVM in all the cases, and its accuracy
is close to that of kerSMBR. The performance of kerSLR is
better than that of its linear counterpart; however, kerSVM does
not show much improvement. Composite kernels present
an interesting case. Here, compSLR shows better performance
than compSMBR on the fingerprint fusion and matches it on
the iris fusion. The composite kernel, by combining homogeneous modalities
into one, reduces the effective number of modalities, hence
the size of the $\Gamma$ matrix is reduced. This decreases the
flexibility in exploiting different modality information via
$\Gamma$. Hence, the performance of compSMBR is not optimal.
It should be noted, however, that we are not comparing composite
kernels with kerSMBR, as we have not optimized
over $\sigma$ values for composite kernels. This is also a major
limitation of composite kernel based methods.
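
The selection of the kernel width described under hyperparameter tuning amounts to a small grid search, which might be sketched as below. The callable `evaluate` is a placeholder for training and testing one modality on the chosen split and returning its recognition rate. It is not part of the paper, and the function name `select_sigma` is hypothetical.

```python
import numpy as np

def select_sigma(evaluate, exponents=range(-3, 4)):
    """Pick the sigma in {2^-3, ..., 2^3} that maximizes evaluate(sigma).

    `evaluate` is a user-supplied function mapping sigma to a score such
    as the rank one recognition rate on one train/test split.
    """
    grid = [2.0 ** e for e in exponents]
    scores = [evaluate(sigma) for sigma in grid]
    best = int(np.argmax(scores))
    return grid[best], scores[best]
```

Running this once per modality would give the per-modality widths that are then kept fixed for the averaged runs.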


B.  AR  Face  Dataset 

The AR face dataset consists of faces with varying illumination,
expression and occlusion conditions, captured in two
sessions. We evaluated our algorithms on a set of 100 users.
Images from the first session, 7 for each subject, were used
for training, and the images from the second session, again
7 per subject, were used for testing. Simple intensity values

Fig. 7: CMCs for individual modalities using (a) kernel SVM, (b) kernel SLR and (c) kerSMBR.


Fig. 8: CMCs for different fusion methods for (a) four fingerprints, (b) two irises and (c) all modalities. Results for composite
kernels using different techniques are shown in (d).


                  kerSMBR  kerSLR-Sum  kerSLR-Major  kerSVM-Sum  kerSVM-Major  compSMBR  compSLR-Sum  compSVM-Sum
4 Fingerprints    97.9     96.8        75.3          93.2        71.4          93.4      95.7         81.7
2 Irises          84.7     83.8        75.2          62.2        47.8          78.9      78.9         55.8
All modalities    99.1     98.9        87.9          96.3        79.5          95.9      98.2         90.4

TABLE V: Rank one recognition performance (%) for the WVU Multimodal dataset.

were used as features. For each face, masks were applied
around the eye, mouth and nose regions, as shown in Figure 9, in
order to provide four weak modalities. These, along with the
whole face, were taken for fusion. The experimental set-up was
similar to that of the previous section. The parameter values $\lambda_1$ and
$\lambda_2$ were set to 0.003 and 0.002 respectively. Furthermore, we
also studied the effect of noise and occlusion on recognition
performance.


Fig.  9:  Face  mask  used  to  crop  out  different  modalities. 

• Comparison of methods: Table VI shows the performance
of different algorithms on the face dataset. The
SMBR approach achieves about 5% improvement over
the other techniques. Here, SR (sparse representation) shows
the classification result using just the whole face. Evaluations
for kernel techniques are not shown, as the linear
kernel was found to perform the best. FDDL
[39] is a state-of-the-art technique based on discriminative
dictionaries, but it uses only a single modality. Thus, by
robustly classifying over multiple modalities, we achieve
a remarkable improvement over the current benchmark.
Further, a comparison with discriminative methods such as
SLR and SVM shows that they perform poorly compared
to the proposed method. This is because weak modalities
are hard to discriminate, hence score-level fusion with the
strong modality does not improve performance. On the
other hand, by imposing reconstruction and joint sparsity
simultaneously, the proposed method is able to achieve
superior performance.

• Effect of noise: In this experiment, test images were corrupted
with white Gaussian noise of increasing variance,
$\sigma^2$. The comparisons are shown in Figure 10. It can be
seen that both the SMBR and SR methods are stable with
noise.

• Effect of occlusion: In this experiment, a randomly chosen
block of the test image was occluded. The recognition
performance was studied with increasing block size.
Figure 11 shows the performance of various algorithms
with block size. SMBR-E is the most stable among all
the methods due to its robust handling of error. Recognition
rates for the others fall sharply with increasing block
size.

• Quality based fusion: Quality determination is important
for fusion here, as a strong modality is being
combined with weak modalities. We studied the effect
of the quality measure introduced in Section III. However, in
this case we fixed the quality of the strong modality, viz. the whole
face, to 1, while for the weak modalities the SCI values
were used. The recognition performance of SMBR-E
and SMBR-WE across different noise and occlusion
levels was studied. Figure 12 shows the performance
comparison with the unweighted methods. Using quality,
the recognition performance of SMBR-WE goes up to
97.4% from 96.9%, whereas for SMBR-E it increases
to 97% from 96%. Similarly, results improve across
different noise levels for both methods. However, SMBR-WE
with quality shows worse performance as the block size
is increased. This may be because it does not handle
sparse error, hence the quality values are not robust.
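
The corruption applied to the test images in the noise and occlusion experiments can be sketched as follows. The details here are assumptions for illustration only: the occluding block is taken to be square and filled with zeros, images are floating point arrays, and the function names are hypothetical. The text does not state the fill value or the pixel scale.

```python
import numpy as np

def add_gaussian_noise(image, variance, rng):
    """Add zero-mean white Gaussian noise with the given variance."""
    noise = rng.normal(0.0, np.sqrt(variance), size=image.shape)
    return image + noise

def occlude_random_block(image, block_size, rng, fill=0.0):
    """Overwrite a randomly placed square block with `fill` (assumed 0)."""
    height, width = image.shape
    top = rng.integers(0, height - block_size + 1)
    left = rng.integers(0, width - block_size + 1)
    occluded = image.copy()
    occluded[top:top + block_size, left:left + block_size] = fill
    return occluded

# Usage with a seeded generator so that the corruption is reproducible.
rng = np.random.default_rng(0)
face = np.ones((40, 30))
noisy_face = add_gaussian_noise(face, variance=0.04, rng=rng)
blocked_face = occlude_random_block(face, block_size=10, rng=rng)
```

Sweeping `variance` or `block_size` over increasing values would reproduce the horizontal axes of the noise and occlusion plots.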


Fig. 12: Effect of quality on rank one recognition performance across
(a) noise level and (b) size of random blocks.


VI.  Computational  Complexity 

The proposed algorithms are computationally efficient. The
main steps of the algorithms are the update steps for $\Gamma$, $\Lambda$,
U and V. For linear fusion, the update step for $\Gamma$ involves
computing $({X^i}^T X^i + \alpha_\Gamma I)^{-1}$ and four matrix multiplications.
The first term is constant across iterations and can be pre-computed.
Multiplication of two matrices of sizes
m x n and n x p can be done in O(mnp) time. Hence, for given
training and test data, the computations are linear in the feature
dimension, and large feature dimensions can be handled
efficiently. Similarly, the update step for $\Lambda$ involves the matrix
multiplication $X^i \Gamma^i$. The update steps for U and V involve only scalar
matrix computations and are very fast. Similarly, in
kernel fusion, the update for $\Gamma$ involves calculating $(K_{X^i,X^i} + \beta_W I)^{-1}$,

Method     Recognition Rate (%)    Method            Recognition Rate (%)
SMBR-WE    96.9                    Linear SVM-Sum    86.7
SMBR-E     96                      SLR-Sum           77.9
SR         91                      FDDL [39]         91.9

TABLE VI: Rank one performance comparison of the proposed method.


Fig. 10: Effect of noise on rank one recognition performance.


Fig. 11: Effect of occlusion (block size) on rank one recognition performance.


which  can  be  pre-computed.  Other  steps  are  similar  to  linear 
fusion.  Classification  step  involves  calculating  the  residual 
error  for  each  class,  and  is  efficient. 
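
The pre-computation argument can be made concrete with a small sketch. It shows only the generic pattern of inverting a regularized Gram matrix once and reusing it in every iteration. The right-hand side passed to `solve` stands in for whatever the actual update supplies, and both the function name `make_regularized_solver` and the parameter `alpha` are illustrative assumptions rather than the paper's notation.

```python
import numpy as np

def make_regularized_solver(X, alpha):
    """Pre-compute (X^T X + alpha I)^{-1} and return a function applying it.

    X has shape (m, n): m is the feature dimension and n the number of
    training samples. Forming X^T X costs O(m n^2), i.e. linear in m, and
    the inverse is of size n x n regardless of the feature dimension.
    """
    n = X.shape[1]
    gram_inv = np.linalg.inv(X.T @ X + alpha * np.eye(n))

    def solve(rhs):
        # Per-iteration cost: one (n x n) by (n x k) multiplication.
        return gram_inv @ rhs

    return solve
```

Inside an iterative solver, `solve` would be called once per iteration with a new right-hand side, while the inverse itself is never recomputed.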

VII.  Conclusion 

We have proposed a novel joint sparsity-based feature level
fusion algorithm for multimodal biometrics recognition. The
algorithm is robust as it explicitly includes both noise and
occlusion terms. An efficient algorithm based on the alternative
direction method was proposed for solving the optimization problem. We
also proposed a multimodal quality measure based on sparse
representation. Further, the algorithm was extended to handle
non-linear variations through kernels. Various experiments have
shown that our method is robust and significantly improves the
overall recognition accuracy.